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A binding potential of mean force (BPMF) is a free energy of noncovalent association in which 
one binding partner is flexible and the other is rigid. Expanding on previous work with host- 
guest systems, I have developed a method to calculate BPMFs for protein-ligand systems. The 
method is based on replica exchange sampling from multiple thermodynamic states at different 
temperatures and protein-ligand interaction strengths. Protein-ligand interactions are represented 
by interpolating precomputed electrostatic and van der Waals grids. Using a simple estimator for 
thermodynamic length, thermodynamic states are initialized at approximately equal intervals. The 
method is demonstrated on the Astex diverse set, a database of 85 protein-ligand complexes rel¬ 
evant to pharmacy or agriculture. Fifteen independent simulations of each complex were started 
using poses from crystallography, docking, or the lowest-energy pose observed in the other simula¬ 
tions. Benchmark simulations completed within three days on a single processor. Overall, protocols 
initialized using the thermodynamic length estimator were system-specihc, robust, and led to ap¬ 
proximately even replica exchange acceptance probabilities between neighboring states. In most 
systems, the standard deviation of the BPMF converges to within 5 ksT. Even with low variance, 
however, the mean BPMF was sometimes dependent on starting conditions, implying inadequate 
sampling. Within the thermodynamic cycle, free energies estimated based on multiple interme¬ 
diate states were more precise, and those estimated by single-step perturbation were less precise. 
The results demonstrate that the method is promising, but that ligand pose sampling and phase 
space overlap can sometimes prevent precise BPMF estimation. The software used to perform these 
calculations. Alchemical Grid Dock (AlGDock), is available under the open-source MIT license at 
https://github.com/ccbatiit/algdock/. 


INTRODUCTION 

Fast and accurate predictions of binding free energies 
between proteins and small organic ligands would have 
significant impact on designing drugs [1-3] and other 
modulators of biological processes. The clear relevance 
of protein-ligand binding affinity prediction in chemical 
biology and drug discovery has inspired a vast array of 
physics-based methods (for a broad review, see [4]), each 
with a different trade-off between computational accu¬ 
racy and speed. 

On one extreme, molecular docking focuses on speed. 
Docking algorithms are designed to quickly obtain plau¬ 
sible configurations of a protein-ligand complex. Scoring 
functions are then used to rank one configuration versus 
another. Docking programs are commonly assessed by 
their ability to redock ligands into crystallographic struc¬ 
tures from which they have been removed. Comparative 
studies [5, 6] and blinded exercises [7] consistently show 
that docking methods are adept at generating the native 
pose but are less competent at giving it the highest rank. 
In this context, it is not surprising that docking scores are 
poorly correlated with binding free energies [5, 6, 8, 9]. 

In contrast, alchemical pathway methods are based on 
rigorous statistical mechanics. The methods involve sam¬ 
pling from a series of possibly nonphysical thermody¬ 
namic states in between end-states where the receptor 
and ligand are bound and unbound. In the unbound 


state, the receptor-ligand nonbonded interaction terms 
may be switched off or the species may be physically 
separated. In accordance to the established statistical 
mechanics of noncovalent binding [10], the receptor is 
usually allowed full flexibility. Unfortunately, sampling 
a fully flexible complex from multiple statistical distri¬ 
butions along an alchemical pathway generally requires 
substantial computing resources; it has been suggested 
that most published studies are not fully converged [llj! 
Even sampling the ligand binding pose is challenging 
[11]. To bypass this difficulty, most pathway calcula¬ 
tions pursue relative binding free energies between sim¬ 
ilar molecules and assume that the binding mode does 
not change. In absolnte binding free energy calculations, 
ligand sampling issues are usually alleviated by confin¬ 
ing the molecule to a specific pose [12, 13]. Within this 
restricted range of problems, alchemical pathway calcu¬ 
lations are amassing a growing track record of accurate 
prediction (e.g. [14-22]). The success of these methods 
suggests that docking may be substantially improved by 
incorporating rigorous statistical mechanics. 

Implicit ligand theory (ILT) [23], a recently derived 
statistical mechanics framework for noncovalent associa¬ 
tion, has the potential to inspire new methods that com¬ 
bine the speed of docking and the rigor of alchemical 
pathway methods. ILT formally separates receptor and 
ligand sampling into two distinct stages. Because the 
first stage of receptor sampling does not require a lig- 



2 


and, receptor conformations can be sampled once and 
used with many different ligands. In contrast, conven¬ 
tional alchemical pathway methods require thorough re¬ 
ceptor sampling for every receptor-ligand pair. The sec¬ 
ond stage is to calculate the binding potential of mean 
force (BPMF) - the binding free energy between a ligand 
and a rigid receptor configuration (Eq. 2) - for each re¬ 
ceptor configuration. Finally, the standard binding free 
energy is an exponential average of BPMFs (Eq. 3). 

While multiple BPMFs are required to estimate a stan¬ 
dard binding free energy, the overall calculation should 
require less computer time than conventional methods 
because BPMFs are easier to estimate than binding free 
energies with a fully flexible receptor. The key reason 
for this speedup is that a BPMF calculation only re¬ 
quires sampling of the ligand, which usually has many 
fewer degrees of freedom than the complex. Furthermore, 
nonbonded interactions between a rigid receptor and lig¬ 
and can be treated by interpolating precomputed three- 
dimensional grids, a strategy hrst developed for docking 
[24]. Once the grid is stored, calculation time no longer 
depends on the size of the receptor. In contrast, con¬ 
ventional alchemical pathway methods require frequent 
force evaluation between flexible receptor atoms. As the 
number of pairwise interactions scales as 0{N‘^) with N 
receptor atoms (neglecting cutoffs), the relative efficiency 
of ILT-based methods will be more pronounced as recep¬ 
tor size increases. 

In the first paper on ILT, BPMFs were estimated for 
simple host-guest systems [23]. Here, the focus is on pre¬ 
cise BPMF estimation for protein-ligand systems. Mob¬ 
ley et al. [14], Ucisik et al. [25] previously computed 
binding free energies between simple ligands and a rigid 
protein, T4 lysozyme. In contrast, the present work in¬ 
volves more diverse systems. Mobley et al. [14] were 
equally rigorous, but did not implement specialized meth¬ 
ods to signihcantly speed their calculations compared to 
flexible-receptor calculations. Ucisik et al. [25] developed 
a fast method based on a number of approximations. As 
in other BPMF [23] and standard binding free energy 
[22, 26, 27] calculations, Hamiltonian replica exchange is 
applied. The main methodological differences between 
the current and previous work are the adaptive initial¬ 
ization of thermodynamic states and the use of linearly- 
scaled interaction grids [24] with transformation-based 
smoothing [28]; these will be discussed in detail in the 
section on Theory and Methods. The method is demon¬ 
strated on the Astex diverse set [29], a curated database 
of 85 high-quality protein-ligand complexes of pharma¬ 
ceutical or agrochemical interest. 

The algorithm is implemented in a new software pack¬ 
age, Alchemical Grid Dock (AlGDock), a python module 
based on the Molecular Modeling Toolkit (MMTK) 2.7.8 
[30]. AlGDock is available under the open-source MIT 
license at https://github.com/ccbatiit/algdock/. 


THEORY AND METHODS 

This section reviews implicit ligand theory, details the 
algorithms in AlGDock, and describes the setup of the 
BPMF calculations on the Astex diverse set. 


Implicit Ligand Theory 


For the noncovalent association between a receptor R 
and ligand L to form a complex RL, R + L ^ RL, the 
standard binding free energy is, 

( 1 ) 

where /3 = {kBT)~^ is the inverse of Boltzmann’s con¬ 
stant times the temperature, C° is the standard concen¬ 
tration (1 M = 1/1660 A^), and Cx is the equilibrium 
concentration of species X G {R, L, RL}. Activities have 
been assumed to be unity, a reasonable approximation in 
the limit of low concentrations. 

Goordinates of the complex, r^L, are partitioned into 
receptor (r^j) and ligand internal (r^,) and external 
coordinates. Based on this partitioning, the interaction 
energy is defined as 4'(rflL) = U{rRL) -U{rR) -U{rL), 
where U{r) = U{r) + W{r) is an effective potential en¬ 
ergy that includes the gas-phase potential energy U{r) 
and solvation free energy W{r) [10]. A BPMF is an ex¬ 
ponential average of interaction energies over ligand co¬ 
ordinates in the binding site [23], 


^ 1 J drLd^L 


where = I{^l) is an indicator function that takes val¬ 
ues between 0 and 1 and specifies whether the receptor 
and ligand are bound or not. 

According to ILT, the standard binding free energy 
AG° is related to an exponential average of BPMFs over 
Boltzmann-distributed receptor configurations rR, 


AG° 


-1 

^ / e~h^^’^n'idrR J 



where D = / I^{^L)d^L is the volume of the binding site. 


Thermodynamic Cycle 


A BPMF can also be expressed as a ratio of partition 
functions [23], 


B{rR) 


f f drRdg \ 

y / I^e-hl^GG+!^(rji)] drRd^L J 


•( 4 ) 


In AlGDock, BPMFs are estimated by completing the 
thermodynamic cycle shown in Figure 1. 
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FIG. 1. Thermodynamic cycle for BPMFs: Schematic. Milestone thermodynamic states are labeled with letters in 
parentheses. Counter-clockwise from the top left: the non-interacting receptor and ligand are desolvated (a&b), the temperature 
is increased to 600 K (b&c), receptor-ligand interaction grids are scaled from 0 to 1 and the temperature is decreased to 300 
K (c&d), and the complex is solvated and grid energies replaced with direct calculations (d&e). Arrows with orthogonal lines 
indicate multiple intermediate thermodynamic states. For BPMF calculations, configurations are sampled from thermodynamic 
states with the rounded boxes and from their intermediates. 

Mathematical expressions for individual free energy differences within the cycle are given in Table I. The sum of 
all the free energy components in Table I is PBirn). 


TABLE I. Thermodynamic cycle for BPMFs: Expressions 


Milestones 


Process 


a&b 

Receptor desolvation 

a&b 

Ligand desolvation 

b&c 

Receptor heating 

b&c 

Ligand heating 

c&d 

Grid scaling/cooling 

d&e 

Gomplex solvation 


Expression 


Jab,R — m 

Jab.L - J j^^-fiTbHr-L)drr^d^L 

r _ I Ie.<=~^HU<.^L)drLdtiL 

Jbc,L- f i^^-fiTUlr-^)drLdiL 

f Ife-f^TbKrRL'idrRdiL 


fde = - In 


/ I^e-f^TVg<.rRL)drLd(R 


Expressions for free energy differences from the thermodynamic cycle in Figure 1. The target temperature Tt = 300 K is the 
basis of /St = {kBTT)~^■ Similarly, the high temperature Th = 600 K is the basis of Ph = {kBTH)~^■ 


Over the course of the cycle, the receptor-ligand in¬ 
teraction strength is scaled and the temperature is var¬ 
ied. The temperature is varied because high-temperature 
states enhance transitions between local energetic min¬ 
ima. Because the protocol involves temperature changes, 
the notation is simplihed by using a lower-case u to 
refer to the reduced potential energy [31], a log prob¬ 
ability density that incorporates /3. For example, at 
milestone c, the reduced potential energy is = 

ldH[U{rL) + U{rii)]. Likewise, the reduced free energy 
difference between two milestones x and y is f^y- Be¬ 


cause converting from reduced to standard potential en¬ 
ergies and free energies involves dividing by /3, the units 
of these reduced quantities are specified as ksT. 

For thermodynamic states that require /^, the ligand is 
confined to the binding site using a flat-bottom harmonic 
potential [23, 26, 27], 


ui{d) = 


0 if d < do 

\Pk{d—do)'^ if d > do 


( 5 ) 


where k = 10000 kJ/(mol nm^) is the spring constant, d 
is the distance between the ligand center of mass and the 
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center of the binding site, and do = 6.0 A is the radius 
of the binding site. There is no restriction on ligand 
rotation. 

At milestone d, the reduced potential energy is, 

UgirRL) = Pt PirL) + U{rR) + '^g{rRL )\, ( 6 ) 

where '^g(rRR) is the grid interaction energy. The grid 
interaction energy, 

'^gi'TRL) = '^PBSAirRh) + 'i!vdw{rRL), (7) 

is based on one electrostatic and two van der Waals grids. 

The electrostatic interaction energy ^pbsa{'<’rl) is 
evaluated by multiplying atomic partial charges with the 
electrostatic potential. The electrostatic potential at 
each ligand position is obtained by trilinear interpolation 
of a precomputed grid. The grid is produced by solv¬ 
ing the linear Poisson-Boltzmann equation around the 
minimized receptor molecule using APBS 1.4 [32] with 
sequential focusing. Coarse grids are at least 1.5 times 
larger than the range of the receptor molecule in each 
dimension. Fine grids have the same size as the van der 
Waals grids, and a spacing of 0.5 A. Coarse grids use mul¬ 
tiple Debye-Huckel boundary conditions, and fine grids 
use coarse-grid solutions as boundary conditions. Both 
grids are solved with the following options: a quintic B- 
spline charge discretization, spline window width of 0.3, 
protein dielectric of 2.0, solvent dielectric of 80.0, solvent 
density of 10.0, solvent radius of 1.4 A, smoothed dielec¬ 
tric and ion-accessibility coefficients, and temperature of 
300.0 K. 

The van der Waals energy '^vdwifRh) is evaluated by 
an analogous grid-based procedure pioneered by Meng 
et al. [24]. In the AMBER [33] molecular mechanics 
force field, the receptor-ligand van der Waals interaction 

energy is represented as, ^ > a 

double sum over ligand atoms i € 1,..., Nug and re¬ 
ceptor atoms j € l,...,Nrec- Pj and Bij are the re¬ 
pulsion and attraction parameters and is the dis¬ 
tance between atoms i and j. Molecular docking pro¬ 
grams usually model these interactions by a geometric 
mean approximation [24], such that Aij = ^An pAjj 
and Bij = \JBu yjBjj. With this approximation, the 
receptor-ligand van der Waals interaction energy at a 
point k is. 



The values Ofc and bk are precomputed on a three- 
dimensional grid with a spacing of 0.25 A. Grids span 


at least 10 -I- lAdmax A in each dimension surrounding 
the ligand center of mass in the crystallographic binding 
mode, where dmax is the maximum distance from any 
ligand atom to its center of mass. For energy and force 
evaluations, Ofc and bk are approximated at actual ligand 
atom positions by trilinear interpolation. 

Because van der Waals potentials are highly nonlinear, 
straightforward trilinear interpolation of van der Waals 
grids provides energies that poorly match directly cal¬ 
culated energies [34]. Hence, van der Waals repulsive 
grid energies are calculated using a transformation, trilin¬ 
ear interpolation, and inverse transformation procedure 
[28]: If /(xi) is the van der Waals attractive or repul¬ 
sive energy at x^, a position vector on a grid corner, then 
9pi) = is evaluated for all grid corners. To 

estimate /(x) at a desired position vector x, g(x) is ob¬ 
tained by trilinear interpolation of g{xi) for the eight Xi 
surrounding x, and f(x) = g{x)~^. Previously, when 
equivalent powers were used for the van der Waals re¬ 
pulsive and attractive grids, m = 2 was found to dra¬ 
matically reduce interpolation error [28]. Here, m = 2 is 
used for the repulsive grid and no transformation for the 
attractive grid; this combination was found to yield the 
most accurate van der Waals energies for docked poses. 

Alchemical transformations that modulate the 
strength of interactions between atoms often face an 
“end-point catastrophe” where free energy changes are 
numerically unstable [1]. To circumvent this issue, a 
set of soft Lennard-Jones repulsive and electrostatic 
grids is introduced between milestones c&d. In these 
soft grids, the original grid value, Vo, is replaced with 
Vmax tanh (vo/vmax)- (Gallicchio and Levy [26] also used 
a hyperbolic tangent energy cap.) For the soft Lennard- 
Jones repulsive grid, Vmax = 10.0 kJ^/^ moH^/^. To 
prevent ligands from being “pinned” by electrostatics, 
the soft electrostatic grid uses as maximum value such 
that the electrostatic energy is less than or equal to 
the soft Lennard-Jones repulsive energy for every heavy 
atom at every grid point. This is established by setting 
Vmax for the electrostatic grid to 10 times the minimum 
ratio of Lennard-Jones and electrostatic scaling factors. 
The reduced potential energy is switched according to 
the protocol, 

= +asgia)'lisgirRL) + ag{a)'iig{rRL)] (II) 
ctsgia) = -{2a- I)^ -I- 1 

^ = (2a-if 

' l-bexp[-I000(a-1)] 

T{a) = (Tt — 7A)a J- Tr 

This protocol turns on the soft grids first, and then the 
unperturbed grids (Figure 2). The potential is consistent 
with milestone c at a = 0 and milestone d at a = 1. 
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FIG. 2. Grid scaling between milestones c&d. Usg 

(dashed line) and Og (solid line) as a function of the progress 
variable a. 

Sampling and Estimation 

The sampling and estimation strategy in AlGDock is 
summarized as follows: 

1. Ligand preparation: the ligand is minimized 
with 500 conjugate gradient steps. The temper¬ 
ature is ramped from 20 K to 300 K over 30 geo¬ 
metrically spaced simulations of 500 steps each. 

2. Thermodynamic state initialization: starting 
from 50 seed configurations, simulations of 1000 
steps were used to initialize each thermodynamic 
state between milestones b&c and c&d. 

3. Hamiltonian replica exchange: simulations 
were conducted between milestones b&c and sub¬ 
sequently between c&d. 

4. Postprocessing: samples from milestones b 
and d were postprocessed using the generalized 
Born/surface area (GBSA) water model [35]. For 
milestone e, grids were replaced by the rigid re¬ 
ceptor configuration and pairwise interactions were 
directly computed. 

5. Estimation: Free energy differences that sum up 
to the BPMF are estimated. 

During thermodynamic state initialization and Hamil¬ 
tonian replica exchange, the No-U-Turn Sampler (NUTS) 
[36] is used to propagate the ligand from one conhg- 
uration to the next. NUTS is a hybrid molecular dy¬ 
namics (MD)/Markov chain Monte Garlo (MGMG) sam¬ 
pling technique [37] that eliminates the need to select 
the MD trajectory length between Monte Garlo trials. 
In other hybrid MD/MGMG techniques, the trajectory 
length is an adjustable parameter that must be carefully 
tuned. NUTS automatically truncates MD trajectories 
when they make a “U-Turn” that brings trajectory end 


points closer to one another, or when energy changes are 
too large. In the present implementation, the change in 
potential energy or total energy (potential and kinetic en¬ 
ergy) is required to be less than 125 kJ/mol. The imple¬ 
mentation also truncates trajectories so that the allotted 
number of MD integration steps is not exceeded. 

Another adjustable parameter, the MD time step, is 
selected using dual averaging - an extension to NUTS 
[36]. Dual averaging adjusts the time step such that an 
acceptance statistic Ht becomes close to <5. Parameters 
for dual averaging were S = 0.6, 7 = 0.05, to = 10, 
K = 0.75, and ^ = In(lOAt). Because adjusting the time 
step disturbs the equilibrium distribution, dual averaging 
is only used during initialization. 

To accelerate transitions between binding poses, ex¬ 
ternal coordinate Markov chain Monte Garlo moves are 
attempted for thermodynamic states between milestones 
c&d for a < 0.01. The external coordinate move consists 
of: 

1. Random rotation. A random quaternion is con¬ 
verted into a matrix that is used to rotate the 
molecule about its center of mass. 

2. Random translation. The magnitude of translation 
in each dimension is drawn from a Gaussian distri¬ 
bution with a standard deviation of 0.6 A. 

The move is accepted or rejected according to the 
Metropolis criterion. Moves are not attempted for a > 
0.01 due to low acceptance probability. 

Thermodynamic state initialization 

For milestones b&c, the first thermodynamic state 
{k = 0) is at 300 K and the ligand is steadily warmed 
to 600 K. The first thermodynamic state between mile¬ 
stones c&d depends on whether there is a fully-bound 
pose available in the binding site. If a pose is available, 
then the first state is fully bound at 300 K. Otherwise, the 
first state is fully unbound at 600 K. In the latter situa¬ 
tion, 50 randomly selected configurations from milestone 
c are placed in the binding site at 453 random center-of- 
mass positions (0.5 positions per A^) and rotated with 
100 different random orientations. 

After state k is initialized, state fc -|- 1 is initialized as 
follows: 

1. Parameter selection: Parameters for state k + 
1 are selected using a new algorithm designed to 
separate states at approximately even intervals in 
thermodynamic length. 

Thermodynamic length is a metric of the distance 
on the manifold of thermodynamic states [38]. For 
a sequence of states, the statistical error in free 
energy calculations is minimized and the replica 
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exchange frequency is nearly maximized when the 
intermediate states are equidistant in thermody¬ 
namic length [39]. Suppose that thermodynamic 
parameters (e.g. temperature, pressure, grid scal¬ 
ing strengths) are specified by a vector A with com¬ 
ponents A*. Let 7 = 7 ( 0 ) describe the dependence 
of A on the variable a, such that 7 ( 0 ) is the ini¬ 
tial and 7 ( 1 ) is the final thermodynamic state. For 
microscopic systems, the thermodynamic length is 
defined by the path integral [39, 40], 



Given a parameter vector A, the reduced potential 
energy is u\{x) = U\{x)/{kBTx), where U\{x) is 
the effective potential energy and T\ is the temper¬ 
ature. The normalized log probability of observing 
a configuration x is l\(x) = —u\(x) — hiZ\, where 
Zx = J is the partition function. These 

quantities are used to dehne elements of the Fisher 
information matrix. 


9il)ij = crl[d"l\,dHx] , (13) 

where is the covariance in state A and 9® denotes 
a partial derivative with respect to A®. For a proto¬ 
col in which only one parameter A® varies with a, 
the length is, £ = ^ax [dHx] dt. 

Numerical estimates of £ are most accurate when 
samples are drawn from many states between 0 < 
a < 1 [40]. Such exhaustive sampling, however, is 
unavailable during initialization. A simple approxi¬ 
mation for the thermodynamic length when one pa¬ 
rameter changes is, £ = AA® cto [d^h ], where AA® 
is the total change in the value of the parameter A® 
and (Jq [9®G] is a standard deviation in the initial 
state. Thus, if one desires £ to be approximately 
constant between different intermediate stages in a 
protocol, then the change in parameter should be 
inversely proportional to (Tq , 


AA® = 


s 

O-Q [9®/a] ’ 


(14) 


where s is an adjustable parameter, the thermody¬ 
namic speed. 

Between milestones b&zc, the parameter that varies 
is the temperature T. As the log probability of a 
ligand conhguration is lx = — ~ T is 

incremented by. 


AA® = - 


SbckT"^ 


(15) 


where Sbc = 0.1. Between milestones c&d, the log 
probability of r^L is lx = —UairnL) ~ IhZa. a is 
incremented by, 


AA® = Scd 


da 


sg 


da 


ax [4® 


sgJ 


ksT^a) 


-f 


da„ 


da 


+ \Tt-Th\ 


ax[Ua 


O'Al^g] 
kBT{a) 
-1 


(rRL)] 


T{a) 


(16) 


with Scd = 0 . 2 . 

If the targeted value of the parameter is exceeded 
(e.g. temperature increased above 600 K), then the 
targeted value is used. 

2. Seed selection: 50 configurations from state k are 
resampled as starting seeds for simulations in state 
fc -I- 1. 

Configurations are drawn from state k with weights 
proportional to exp[Mfc(xi) — Uk+i{xi)], where 
Ukix) = Uk{x)/(kBTk) is the reduced potential 
energy in state k. In the limit of inhnite sam¬ 
pling of state k, resampled configurations would be 
Boltzmann-distributed in state k -\-l. With imper¬ 
fect sampling of state k, resampled configurations 
approximate the Boltzmann distribution in state 
fc -I- 1. 

3. Sampling: Simulations of 1000 steps are run from 
each seed. 

NUTS is run with dual averaging to adjust the time 
step. Each simulation of 1000 steps has its own es¬ 
timate of the ideal time step. The median time 
step, restricted to between 0.25 and 2.5 fs, is sub¬ 
sequently used in Hamiltonian replica exchange. 

4. Verification: The mean replica exchange proba¬ 
bility, (pacc)) is used to verify that thermodynamic 
states are not too distinct nor similar. 

It is estimated by taking the sample mean of Pacc 
(Eq. 17) for every pair of initial samples (at the 
same time index) from states k and k-\-l. If (pacc) 
is estimated to be too low (below 0.3), then param¬ 
eters for state k 1 are reselected with a smaller 
increment (thermodynamic speed is adjusted by a 
factor of 4/5) and simulations are repeated. If it is 
too high (above 0.99), then state k is removed. 


Hamiltonian replica exchange 

Initialization is followed by Hamiltonian replica ex¬ 
change [23, 26, 27, 41] between milestones b&c (15 cy¬ 
cles) and c&d (18 cycles). Replica exchange is a Markov 
chain Monte Carlo move that swaps the configurations 
of a pair of simulations at different thermodynamic 
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states. (Equivalently, it may be regarded as swapping 
the states.) Consider the thermodynamic states a and 
b with reduced energies Ua and Ub, respectively. If x is 
the original configuration in state a and y the original 
configuration in state 6, then the acceptance probability, 


Pace = min 


1 „-Ua{y)-1J.b{x) + Ua{x) + Ub{y) 

-L, C , 


(17) 


preserves the Boltzmann distribution in both states. 

Typical replica exchange protocols attempt exchanges 
between pairs of neighboring thermodynamic states, but 
this restriction is unnecessary. As replica exchange is a 
type of Gibbs sampling [42], an arbitrary number of at¬ 
tempts can be made between arbitrary pairs of states. In 
the present simulations, each replica exchange cycle con¬ 
sists of 1000 iterations with 50 NUTS steps, 20 external 
coordinate MCMC moves (if appropriate), and a sweep 
of replica exchange. Each sweep of replica exchange in¬ 
cludes attempts to swap configurations between pairs of 
states that are 1, 2, ...,min(5, AT) states apart, where K 
is the total number of thermodynamic states in the di¬ 
rection. 

The statistical inefficiency g is estimated from the inte¬ 
grated autocorrelation time of replica exchange indices, 
as described [43]. Multiple samples (up to 3 between 
milestones b&c and 25 between milestones c&d) were 
saved for every g replica exchange sweeps. 


Estimation 


BPMFs were estimated according to, 

— fab^R 4 “ fah^L 4 “ fhc^L 4 - fed 4 - fde- ( 1 ^) 


This expression replaces the sum fbc,R 4- fcd,o with. 


fed = - In 




(19) 


which can be evaluated without determining the receptor 
internal energy U{rp). The receptor desolvation free en¬ 
ergy is estimated by the difference, fab,R = h{U{rR) - 
U{rR)). 

Other free energy differences are estimated based on 
equilibrated samples from replica exchange between mile¬ 
stones b&c or c&d. Equilibration is determined by cal¬ 
culating the mean and standard deviation of the energy 
in milestones b or d, respectively. Systems are consid¬ 
ered to be equilibrated once the mean energy is within a 
standard deviation of the mean energy in the last cycle. 
fab,R and fde are estimated by free energy perturbation 
[44] using configurations drawn from milestones a and d, 
respectively. fbc,L and fed are estimated by the multi¬ 
state Bennett acceptance ratio [31], which uses potential 
energies from every replica. 


Astex diverse set BPMF calculations 

The Astex diverse set is a subset of the protein data 
bank carefully curated for quality and diversity [29]. The 
members of the set will be referred to be their protein 
data bank identifiers, e.g. Is3v. Each system in the 
set was set up using AmberTools 14 [45]. Proteins and 
ions (from protein.mol2) were parameterized using the 
AMBER ffl4SB force field. Zinc parameters were also 
used for iron, mercury, and magnesium. Non-standard 
amino acid residues were converted to their standard 
counterparts. Using the python wrapper to the Open- 
Eye OEChem Toolkit, all crystallographic ligands were 
parameterized with the Generalized Amber Force Field 
[46] and AMIBCG charges [47, 48]. The main ligands 
from ligand.mol were regenerated to lose memory of the 
crystallographic pose. Noncrystallographic atoms in each 
receptor were minimized for 2500 steps in gas and 2500 
steps in GBSA solvent using NAMD 2.9 [49]. 

For each system, 15 independent simulations were per¬ 
formed between milestones b&c. Subsequently, simula¬ 
tions were continued between milestones c&d from three 
different starting points: crystallography, docking, or 
the lowest-energy pose (according to Ug) sampled in all 
the other simulations. In most situations, starting from 
docked poses will be the most practical approach, but 
the others are a useful comparison. 

Docked poses were obtained using the anchor and 
grow algorithm in UCSF DOCK 6 [50] to redock lig¬ 
ands into their respective proteins. First, sphgen was 
run with default parameters: dotlim = 0.0, radmax = 
4.0 A, and radmin = 1.4 A. DOCK was run with 20000 
maximum orientations, using internal energy with an ex¬ 
ponent of 12, a flexible ligand, an a minimum anchor 
size of 40. Pruning was performed with clustering, 1000 
maximum orientations, a clustering cutoff of 1000, and 
conformer score cutoff of 25.0. A bump filter was used 
with max_bumps_anchor = 12 and max_bumps_growth = 
12. Final conformations were clustered with a root mean 
square deviation (RMSD) threshold of 2.0 A. For two 
systems, lt46 and ln46, no docked poses in the binding 
site were obtained. 

After the starting poses were minimized for 1000 conju¬ 
gate gradient steps in Ug, the lowest-energy pose was used 
to initialize milestone d. After the thermodynamic states 
were initialized, docked poses within the binding site were 
also used as starting points for each state in replica ex¬ 
change. The lowest-energy pose was used in milestone d. 
Higher-energy poses were used in intermediate replicas 
to fill all available thermodynamic states. If there were 
more states than docked poses, the lowest-energy pose 
was duplicated. For simulations starting from a single 
pose, replicas were not reinitialized at the beginning of 
replica exchange. 

Calculations were performed on the Open Science Grid 



[51], Duke shared computing resources, or the Minh 
group computing cluster at IIT. Each node on the Minh 
group cluster includes dual AMD Opteron 6376 2.30GHz 
16MB cache sixteen-core processors and 64GB DDR3 
1600 EGG registered memory. Benchmark calculations 
were run on the Minh group cluster. 

To quantify the precision of the results, we consider 
the root second moment about a reference value Xr, 


a[x, Xr] 






Xr) , 


( 20 ) 


for N samples indexed by a;„. When Xr is the mean, then 
Equation 20 is the standard deviation, which will simply 
be referred to as a[x\. We also consider the root second 
moment about the minimum, a[x^Xmin]^ and about the 
sample mean at the end of a simulation, a[x,Xend\- 


RESULTS 

Thermodynamic state initialization 

Protocol length 



Fraction of systems 


Overall, the present method for thermodynamic state 
initialization robustly yields protocols that are adapted 
to specific systems. The large range in the number 
of thermodynamic states. Nutates, provides evidence of 
system-specific adaptation (Figure 3). Between mile¬ 
stones b&c, the average number of states, Nstates ranges 
from 35.1 to 93.6. Between milestones c&d for sim¬ 
ulations initialized from the crystallographic pose, the 
Nstates ranges from 84.6 to 190.1. For most systems, the 
standard deviation of the number of states, <j[N states] 
is small relative to the Nstates, indicating that the pro¬ 
cedure is robust. The robustness and apparent system- 
specificity of the protocols implies that a universal ther¬ 
modynamic protocol may strongly deviate from optimal¬ 
ity, e.g. (pacc) may be inconsistent across replicas. 

In some systems, however, Nstates is not robust; 
Nstates is dependent on the starting pose and/or 
cr[Nstates] is large. Protocols will vary when the initial¬ 
ization stage samples from distinct local minima and es¬ 
timates different thermodynamic lengths. Between mile¬ 
stones b&c, the a[N states] is larger than 2 in three sets 
of simulations: ltz8 (6.1), Irlh (7.3), and Ivcj (13.6). 
The larger variance in protocol lengths occurs for differ¬ 
ent reasons (Figure 4). In Irlh simulations, the protocols 
require variable number of states to reach 400 K, after 
which the slope is similar. For ltz8 and Ivcj, there are 
two classes of protocols, indicative of sampling from dis¬ 
tinct local minima. With ltz8, the two protocols are ob¬ 
tained with approximately equal probability. With Ivcj, 
the short protocol was observed in only 2 of 15 simula¬ 
tions. 


FIG. 3. Number of thermodynamic states (a) between 
milestones b&c, and (b) between milestones c&d. The marker 
indicates the mean value and error bars the standard de¬ 
viation of 15 independent simulations. Between milestones 
c&d, simulations are initialized from the crystallographic 
(x), lowest-energy docked (circles), or lowest-energy observed 
(downward-facing triangles) pose. They are ordered by the 
mean number of states for simulations initialized from the 
crystallographic pose. 

Protocols between milestones c&d are more variable 
than those between milestones b&c. Interaction grids 
introduce more possibilities for trapping in local minima. 
For all three starting points, 117f has the largest variance 
in Nstates- The protocols are most similar for the first 40 
thermodynamic states and the last 10, but take different 
paths in between (Figure 5). It is worth noting, however, 
that high variation in protocols does not necessarily lead 
to inaccurate results, as will be explained in the following 
sections. 


Time steps 

In the vast majority of initialization processes, dual 
averaging converged to a time step around 1.5 fs (Figure 
6 and Table II). This is true both between milestones 
b&c and c&d, suggesting that the grid does not signifi¬ 
cantly affect the time step in the vast majority of cases. 
As a notable exception, the shortest time steps appear 
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FIG. 4. Protocols between milestones b&c for 15 inde¬ 
pendent simulations of the ligand from (a) ltz8, (b) Irlh, and 
(c) Ivcj. 


around a = 0.5 between milestones c&d in simulations 
of 2bm2 and ln46. In ln46, there is a steady decrease in 
time step away from the end points. Because simulations 
of ln46 did not start with a docked pose, they are more 
likely to occupy metastable minima in the grid. Near 
0 = 0, the grid has little influence. Near a = 1, simula¬ 
tions are dominated by a smaller number of more stable 
poses. The longest time steps are observed with IjdO, 
where most simulations converged to time steps between 
1.8 and 2 fs. The ligand in IjdO is notable for being small, 
relatively rigid, and possessing few hydrogen atoms (5). 
The small number of hydrogen atoms is relevant because 
unconstrained hydrogen vibrations often limit the maxi¬ 
mal time step of molecular dynamics integrators. 

TABLE II. Time step statistics 


Milestones 

Mean 

Standard Deviation 

Min 

Max 

b&c 

1.58 

0.11 

1.27 

2.09 

c&d 

1.55 

0.12 

0.87 

2.08 


Statistics for time steps (in fs) from dual averaging, based 
on all protocols from all systems. 


FIG. 5. Protocols between milestones c&d for 15 inde¬ 
pendent simulations of the complex from 117f, starting from 
the crystallographic (a and b), docked (c and d), and lowest- 
energy observed (e and f) poses. The left panel shows the 
protocol on a linear scale for a > 0.01, indexed such that the 
first state is at a = 1.0. The right panel shows the protocol 
on a semilog scale for a < 0.01, indexed such that the first 
state is at a = 0. 


Replica exchange acceptance probabilities 

A good replica exchange protocol has a reasonable 
(Pacc) between all neighboring states. The key benefit 
of replica exchange is to spread sampled configurations 
across a range of different thermodynamic states. A bot¬ 
tleneck in the exchange of configurations across the pair 
of states can eliminate this benefit of replica exchange; 
the simulations effectively become two independent sets 
of simulations. Low exchange probabilities are also in¬ 
dicative of poor phase space overlap, which can limit the 
convergence of free energy estimates [52]. 

Because of its importance to efficient simulation, Al- 
GDock includes an estimate of (pacc) to verify new ther¬ 
modynamic states during the initialization step. (The no¬ 
tation pace is used to refer to an estimator for (pacc))- In 
some simulations, however, the phase space explored dur¬ 
ing replica exchange can be distinct from that explored 
during initialization, leading to a substantial change in 
the observed Pacc- Thus, thermodynamic state verifica¬ 
tion was followed up by estimating (pacc) based on equi- 
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FIG. 6. Time step statistics. Statistics for time steps from 
dual averaging, based on all protocols from all systems. His¬ 
tograms of time steps between 1.0 and 2.0 fs for (a) between 
milestones b&c and (c) between milestones c&d. The largest 
bin count is 10220 between milestones b&c and 63514 be¬ 
tween milestones c&d. Five protocols with the (b) shortest 
and (d) longest observed time steps between milestones c&d. 
The shortest time steps are observed in 2bm2 (x) and ln46 
(other markers). The longest time steps are observed in IjdO. 


librated samples during replica exchange. 

Estimates of ipacc) indicate that the initialization pro¬ 
tocol does not lead to any simulations with replica ex¬ 
change bottlenecks (Figure 7 and Table III). Between 
milestones b&c, pace estimated during replica exchange 
are high and have low variance. This outcome is consis¬ 
tent with achieving the goal of nearly equal thermody¬ 
namic length between adjacent thermodynamic states. It 
also implies that the configuration spaces explored during 
initialization and replica exchange are largely the same. 
Between milestones c&d, the replica exchange rates are 
also high but there is a larger variance. While most 
Pace are between 0.7 and 1.0, there are a few simulations 
where pace is much lower; the lowest observed values are 
during simulations of 2bm2 (0.21) and lgm8 (0.28). In 
these simulations, the acceptance probability dips around 
a = 0.75, but are high for most other values of a. These 
drops indicate that different conformations are explored 
in replica exchange compared to during thermodynamic 
state initialization. However, even these low pace are not 
low enough to be considered a bottleneck; they are ex¬ 
pected to allow configurations to pass through the ther- 


FIG. 7. Mean acceptance probability statistics. Statis¬ 
tics for mean acceptance probabilities from replica exchange, 
based on all simulations. Histograms of pace between 0.7 and 
1.0 for (a) between milestones b&c and (c) between milestones 
c&d. The largest bin count is 35785 between milestones b&c 
and 63592 between milestones c&d. pace for fifteen protocols 
with the lowest observed pace (b) between milestones b&c and 
(d) and between milestones c&d are shown with the a line con¬ 
necting neighboring states. Between milestones b&c, the tem- 
peratnre increases with a snch that T{a.) = (Th — TT)a + TT- 
The simulations are from ls3v (7), 2bm2 (3), lr55 (3), lz95 
(1), and lv48 (1). Between milestones c&d, the simnlations 
are from lv48 (10), lt40 (3), 2bm2 (1), and lgm8 (1). 


modynamic states several times per cycle. 

TABLE III. Mean acceptance probability statistics 


Milestones 

Mean 

Standard Deviation 

Minimum 

b&c 

0.94 

0.010 

0.84 

c&d 

0.91 

0.039 

0.21 


Statistics for mean acceptance probabilities from replica 
exchange, pace, based on all simnlations. 

The observed pace also underscore the point that large 
variation in protocols is not necessarily problematic. 
While simulations of ltz8, Irlh, and Ivcj have a rela¬ 
tively large variation in Nstates between milestones b&c, 
all pace estimated from replica exchange are greater than 
0.9 (Figure 8). As is particularly clear for low tempera¬ 
ture states of Irlh, longer protocols have a higher pace- 
Another bump in pace occurs between the last pair of 
thermodynamic states, where an even interval in ther- 
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Temperature (K) 


large range, from 13.6 to 71.1 hours (Figure 9). In all 
simulations, the majority of time was spent in replica ex¬ 
change between milestones c&d, and the second largest 
fraction in replica exchange between milestones b&c. 
Initialization and analysis between milestones c&d con¬ 
sumes a small but noticeable fraction of simulation time, 
whereas the analogous steps between milestones b&c 
consume a nearly negligible fraction of simulation time. 
There is a trend relating the calculation time affiliated 
with milestones a to c and milestones c to e, but the 
correlation is imperfect. 



FIG. 8. Mean acceptance probabilities for 15 indepen¬ 
dent simulations of the ligand from (a) ltz8, (b) Irlh, and (c) 
Ivcj between milestones b&c. pace is shown as a line connect¬ 
ing neighboring temperatures. 


modynamic length would necessitate going higher than 
the temperature of interest. 


Simulation time 

Due to the heterogeneity of computing resources, vari¬ 
ability of queue times, and the diverse nature of the sys¬ 
tems, calculations took a variable length of time to com¬ 
plete. Some calculations completed in one day, and all 
were done within a week. 

Total CPU time of benchmark simulations spanned a 


FIG. 9. Benchmark simulation times starting from the 
crystallographic pose. In ascending order, lines depict the 
cumulative time for initialization, replica exchange, and anal¬ 
ysis (running GBSA energy and MBAR calculations), first 
for milestones a to c and then milestones c to e. Systems are 
ordered along the x axis by the total simulation time. 

Sampling 

At milestone c, sampled configurations are uniformly 
distributed in a sphere. As a increases from 0 to 1 be¬ 
tween milestones c&d, the phase space of the ligand is 
gradually restricted. First, the soft grids prevent ligand 
atoms from overlapping with receptor atoms. At inter¬ 
mediate values of a, the ligand may assume several poses 
(e.g. Figure 10). Finally, at milestone d, the ligand usu¬ 
ally but does not always sample from a single minimum. 


In many simulations, the phase space sampled at higher a is a subset of that sampled at lower a (e.g. Figure 11). In 
others, however, the conformational minima for a « 0.5, where the soft grids are at full strength, are entirely distinct 
from the minima for a « 1 (e.g. Figure 10). When there are shifts in important phase space, the thermodynamic 
states at a « 0.5 are less beneficial to sampling from milestone d. Nonetheless, phase space overlap between adjacent 
states is a sufficient condition for precise free energy estimates. 
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FIG. 10. Samples from evenly spaced thermodynamic states between milestones c&d, taken from a representative 
simulation of 117f starting from docked poses. 


At milestone d, most simulations sample from a single minimum (Figure 12), but there are several other situations. 
These include, 

1. Sampling several alternate poses that are (e.g. Figure 12(a)) or are not (e.g. Figure 12(d)) clearly separated; 

2. Sampling a clearly defined but sparsely populated alternative pose (e.g. Figure 12(b)); 

3. Sampling alternate poses that share a common warhead position, but have a floppier tail (e.g. Figure 12(g)). 


Native pose identification 

A large fraction of simulations include the crystallo¬ 
graphic pose among samples from milestone d (Figure 
13). Unsurprisingly, it is most often observed in simula¬ 
tions starting from the crystallographic pose. After equi¬ 
libration, all the simulations still sample a pose with a 
heavy-atom RMSD less than 0.75 A. In contrast, a native 
pose (defined as having a heavy-atom RMSD less than 
2.0 A) is observed in 85.6% of simulations starting from 
the lowest energy pose and 77.7% of simulations starting 
from docked poses. According to Ug, the lowest-energy 
observed pose is a native pose in 61/85 = 71.8% of sys¬ 
tems. This implies that Boltzmann sampling was able to 
start at another pose and find the native pose in 13.8% 
of simulations. Starting docked poses include a native 
pose in 75/85 = 88.2%, and rank the native pose with 
the lowest Ug (and lowest UCSF DOCK 6 grid score) in 
48/85 = 56.5% of systems. This implies that the sam¬ 
pling procedure was able to bring the native pose into 
milestone d, via Monte Carlo moves or replica exchange, 
in 21.2% of simulations. 


Overall, the GBSA force field is better at identifying 
the native pose than the grid energy Ug. According to the 
grid energy, the lowest energy configuration has a RMSD 
less than 2.0 A in 78.5% of simulation starting from the 
crystallographic pose, 68.9% of simulations starting from 
the lowest energy pose, and 55.6% of simulations starting 
from docked poses. According to the GBSA force field, 
the analogous fractions are 87.1%, 75.1%, and 61.5% of 
simulations, respectively. 

Estimation 

Standard deviation of free energy estimates 

In most systems, the standard deviation of the BPMF 
converges to within 5 ksT (Figure 14 and Table IV). 
Within the thermodynamic cycle, free energies estimated 
based on multiple intermediate states (/he and fed) were 
more precise, and those estimated by single-step pertur¬ 
bation {fab and fde) were less precise. Simulating from 
multiple intermediate states promotes phase space over¬ 
lap between adjacent states, leading to more precise free 
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FIG. 11. Samples from evenly spaced thermodynamic states between milestones c&d, taken from a representative 
simulation of Ikzk starting from docked poses. The protein structure structure is shown with ribbons and the crystallographic 
ligand pose is shown with a thick licorice representation and purple carbon atoms. The same illustration scheme is used in 
Figures 10 and 12. Figures 11, 10, and 12 were generated with VMD [53]. 



(a) Imzc. docked 


(d) 1 hww, lowest energy 


(e) 1hp0, docked 


(h) 1gm8, lowest energy 


FIG. 12. Samples from milestone d, taken from representative simulations. 


energy estimates. Indeed, cr[/6c] and o-[fcd] are less than 
5 ksT except in two systems with no docked poses in 
the binding site: ln46 and lt46. For these simulations, 
randomly placed ligands end up sampling different local 
For systems with poor convergence of BPMF esti- 


minima in independent simulations and result in distinct 
free energy estimates. Among all free energy differences 
between adjacent milestones, fde is usually the least pre¬ 
cise (Table V). 

mates, longer sampling is not necessarily an effective way 
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Fraction of simulations 


to improve the precision of free energy estimates. While 
a[f,fend\ usually decreases monotonically with sample 
size, this trend is not universally true (Figure 15). In 
some systems there are temporary jumps in a[f,fend] 
due to instability in determining the equilibrated cycle. 
More importantly, in other systems, a[f,fend] appears 
flat, with little or no change with increased sampling. 


FIG. 13. Heavy-atom RMSDs from the native state, for 

samples from milestone d (after equilibration) from all of the 
simulations starting from (a) the crystallographic, (b) docked, 
and (c) lowest-energy observed poses. Symbols denote the 
lowest observed RMSDs in the simulation (x), or RMSDs of 
the lowest-energy pose according to Ug (circles) or the GBSA 
force field (triangles). Each data set is sorted in ascending 
order. To best emphasize native-like poses, the ordinate is 
truncated at 3.0 A. 


In some systems, the variance is low but free energy 
estimates are dependent on the starting poses. When 
results from all simulations between milestones c&d are 
grouped together, the standard deviation in many sys¬ 
tems no longer converges to within 5 k^T. Because con¬ 
verged thermodynamic quantities should not depend on 
initial conditions, this inconsistency implies that some 
simulations are not fully converged. 

When free energy estimates contradict, it is not com¬ 
pletely clear which result is most consistent with the force 
field. False convergence is quantified based on the mini¬ 
mum value, using a[x,Xmin]- A system is considered to 
be falsely converged when a[f] is less than 5 ksT but 
a[f, fmin] is greater than 15 ksT. Based on this defini¬ 
tion, false convergence occurs in a nontrivial fraction of 
systems (Table IV). 

Ultimately, poor convergence and false convergence 
are caused by incomplete sampling. The most straight¬ 
forward explanation for incomplete sampling is that a 
simulation starts and remains trapped in a local min¬ 


imum distinct from the global minimum energy. In¬ 
deed, between milestones c&d, most falsely converged 
calculations starting from the crystal structure (4/5) and 
docked poses (7/11) have no starting pose within 1.0 
A of the lowest-energy observed pose. However, sam¬ 
pling the lowest-energy structure is not a sufficient con¬ 
dition for complete sampling, as other poses may have 
a lower free energy. For simulations starting from the 
lowest-energy observed pose, four systems appear falsely 
converged between milestones c&d (formatted by PDB 
identifier (ct[/], a[f, fmin]) with cr in units of ksT): Ihww 
(0.353, 16.1), Irlh (3.95, 15.3), IhpO (0.496, 16.7), IgmS 
(0.422, 15.6). For Ihww (Figure 12(c) and (d)) and IgmS 
(Figure 12(g) and (h)), starting from the crystallographic 
pose leads to a lower estimate of fed than the lowest- 
energy pose. For IhpO (Figure 12(e) and (f)), a docked 
pose has the lowest estimated fed- Irlh does not appear 
to sample different poses; the large cr[/, fmin] is the result 
of an outlier with a significantly lower fed estimate than 
average. 
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FIG. 14. Standard deviation of free energy estimates for (a) 15 independent simulations of the ligand or of the complex 
starting from (b) the crystallographic, (c) docked, and (d) lowest-energy observed poses. Panel (a) is ordered by the standard 
deviation of the free energy difference between milestones a&c. The other panels are ordered by the standard deviation of the 
free energy difference between milestones a&e. 


DISCUSSION 

The selection of intermediate states between two ther¬ 
modynamic milestones of interest is a ubiquitous problem 
in molecular simulation. The usual approach is an iter¬ 
ative trial-and-error process starting from a naive proto¬ 
col, checking for issues such as replica exchange bottle¬ 
necks, and inserting and removing states as necessary. I 
have developed a simple and robust approach to initialize 
a series of thermodynamic states based on only a single 
adjustable parameter, the thermodynamic speed. Due 
to this automated procedure, I was able to run a large 
number of simulations on a diverse array of protein-ligand 
complexes. In the vast majority of cases, the protocols 
yielded consistent replica exchange rates across neighbor¬ 
ing thermodynamic states without further hne-tuning. 
The described approach to trailblazing thermodynamic 
state space may find use in other classes of simulations. 

Replica exchange calculations in this present study in¬ 
clude more thermodynamic states than most published 
molecular simulations. Conventional wisdom about 
replica exchange is that an optimal number of replicas 
will maximize efficiency. With too few replicas, there is 


limited phase space overlap between neighbors and ex¬ 
change rates are vanishingly small. With too many repli¬ 
cas, metrics of replica exchange efficiency, such as a mean 
round-trip time, diminish. However, in a recent study in¬ 
volving extensive simulation of several distinct processes, 
it was found that if there are no bottlenecks, the num¬ 
ber of states has little impact on the convergence of free 
energy estimates (the manuscript is under preparation). 
Hence, I chose to include a large number of states to al¬ 
low for the possibility that later sampling will explore 
different regions of phase space and reduce the exchange 
rate between neighboring states. 

While useful, the described thermodynamic state ini¬ 
tialization process remains imperfect. Replica exchange 
is particularly beneficial when the important phase space 
of a thermodynamic states is a subset of the important 
phase space of another. The present procedure is limited 
to the variation of a single thermodynamic parameter be¬ 
tween milestones, and can do little to promote the sub¬ 
set relationship. Future improvements could accomodate 
varying multiple parameters (e.g. separate parameters 
for the temperture, van der Waals grids, and electrostatic 
grids) in between thermodynamic states of interest, while 
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Replica exchange cycle 


Replica exchange cycle 


FIG. 15. Standard deviation of free energy estimates as a function of cycle: a[fac,fac,erLd] based on 15 independent 
simulations of (a) the ligand or clfae, fae,end] based on the ligand simulations and 15 independent simulations of the complex 
starting from the (b) crystallographic, (c) docked, and (d) lowest-energy observed poses. 


maintaining a phase space subset relationship. 

While a direct comparison is difficult, the described 
procedure appears less successful than the best docking 
programs at identifying the native pose. Hartshorn et al. 
[29] found that with a larger site (10 A), the top-ranked 
GOLD solution is within 2 A of the experimental binding 
mode in 80.4% of calculations. In contrast, only 56.5% of 
the present UCSF DOCK 6 calculations were successful, 
suggesting that setup and parameterization, and perhaps 
the docking algorithm, could be fine-tuned. Native pose 
identification based on AlGDock performs similarly to 
UCSF DOCK 6, suggesting that replica exchange sam¬ 
pling from the Boltzmann distribution does not signifi¬ 
cantly increase the diversity of poses. As evidenced by 
the variable performance as a function of starting pose(s), 
improving Boltzmann sampling schemes should improve 
native pose identification. 

Nonetheless, the sampling and estimation approach is 
able to attain precise BPMF estimates in the majority 
of systems. As for the others, it is unsurprising that the 
same approach does not work equally well in all cases. 
With a diverse set, ligands and receptors will exhibit dis¬ 
tinct levels of complexity and may be driven to bind by 
different factors. Indeed, the inadequate sampling that 


led to convergence issues for several systems is probably 
not unique to BPMF calculations [11]. Because of the 
relative simplicity of the calculations, I have been able 
to conduct more replicates from more distinct starting 
points than other alchemical standard binding free en¬ 
ergy calculations. The sheer number of simulations has 
allowed us to expose problems that may not be clear from 
fewer calculations. Our results suggest that Hamiltonian 
replica exchange with molecular dynamics is not always 
an adequate approach for sampling distinct ligand bind¬ 
ing poses. Specialized sampling methods will likely be re¬ 
quired in a more broadly effective protein-ligand BPMF 
estimation procedure. It is also clear that for some sys¬ 
tems, there may be limited phase space overlap between 
different implicit solvent models. More precise BPMF 
estimation can be attained by introducing intermediate 
states or implementing more similar force fields for mile¬ 
stones a&b and d&e. 


CONCLUSIONS 

I have developed a method to initialize thermodynamic 
states and estimate BPMFs for protein-ligand systems. 
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TABLE IV. Number of converged systems 


Milestones 

Starting point 

< 1.5 fcsT 

< 5.0 ksT 

False 

a&b 

- 

69 

82 

- 

b&c 

- 

82 

85 

- 

a&c 

- 

64 

80 

- 

c&d 

xtal 

84 

85 

5 

c&d 

dock 

82 

83 

9 

c&d 

lowest Ug 

77 

85 

4 

c&d 

all 

48 

71 

- 

d&e 

xtal 

34 

76 

4 

d&e 

dock 

26 

73 

8 

d&e 

lowest Ug 

37 

80 

11 

d&e 

all 

17 

63 

- 

a&e 

xtal 

28 

74 

6 

a&e 

dock 

20 

68 

11 

a&e 

lowest Ug 

27 

77 

12 

a&e 

all 

15 

52 

- 


Number of systems (out of 85) in which the free energy 
difference has a standard deviation of less than 1.5 or less 
than 5.0 ksT. False convergence means that the system 
a[x] < 5.0 ksT, but cr[x,Xmin] > 15.0 ksT. For the xtal, 
dock, and lowest Ug starting points, the standard deviation 
is based on 15 independent simulations. The starting point 
of ‘air combines data from the other starting points, and is 
thus based on 45 independent simulations. 


TABLE V. Least precise free energy components 


Starting point 

Subset 

a&b 

b&c 

c&d 

d&e 

xtal 

all 

11 

0 

5 

69 

dock 

all 

10 

1 

5 

69 

lowest Ug 

all 

12 

1 

9 

63 

all 

all 

6 

0 

20 

59 

xtal 

imprecise 

5 

0 

0 

10 

dock 

imprecise 

3 

0 

2 

18 

lowest Ug 

imprecise 

4 

0 

1 

13 

all 

imprecise 

3 

0 

11 

25 


The least precise free energy estimates between adjacent 
milestones. Counts are based on all simulations or the 
imprecise subset, when the standard deviation of the total 
BPMF is greater than 5.0. 


The largest sources of imprecision are found to be ligand 
pose sampling and phase space overlap between implicit 
solvent models. Intriguingly, I found that starting from 
the lowest-energy pose is not a sufficient condition for 
converged BPMF estimates. 

Future studies may build on the current work to com¬ 
pare BPMFs with molecular docking scores as classifiers 
of binding. Another useful direction will be to assess the 
convergence of standard binding free energy estimates 


upon increased receptor conformational sampling. 
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